Notice of Allowability 



Application No.: 10/014,742 
Examiner: Hoang-Vu A. Nguyen-Ba 
Applicant(s): HEEB, BEAT 
Art Unit: 2192 



-- The MAILING DATE of this communication appears on the cover sheet with the correspondence address -- 

All claims being allowable, PROSECUTION ON THE MERITS IS (OR REMAINS) CLOSED in this application. If not included 
herewith (or previously mailed), a Notice of Allowance (PTOL-85) or other appropriate communication will be mailed in due course. THIS 
NOTICE OF ALLOWABILITY IS NOT A GRANT OF PATENT RIGHTS. This application is subject to withdrawal from Issue at the initiative 
of the Office or upon petition by the applicant. See 37 CFR 1.313 and MPEP 1308. 

1. ☒ This communication is responsive to Amendment filed December 13, 2004. 

2. ☒ The allowed claim(s) is/are 1-18, 20-22 and 24. 

3. □ The drawings filed on are accepted by the Examiner. 

4. □ Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f). 

a) □ All b) □ Some* c) □ None of the: 

1. □ Certified copies of the priority documents have been received. 

2. □ Certified copies of the priority documents have been received in Application No. . 



3. □ Copies of the certified copies of the priority documents have been received in this national stage application from the 
International Bureau (PCT Rule 17.2(a)). 
* Certified copies not received: . 

Applicant has THREE MONTHS FROM THE "MAILING DATE" of this communication to file a reply complying with the requirements 
noted below. Failure to timely comply will result in ABANDONMENT of this application. 
THIS THREE-MONTH PERIOD IS NOT EXTENDABLE. 

5. □ A SUBSTITUTE OATH OR DECLARATION must be submitted. Note the attached EXAMINER'S AMENDMENT or NOTICE OF 

INFORMAL PATENT APPLICATION (PTO-152) which gives reason(s) why the oath or declaration is deficient. 

6. ☒ CORRECTED DRAWINGS (as "replacement sheets") must be submitted. 

(a) □ including changes required by the Notice of Draftsperson's Patent Drawing Review (PTO-948) attached 

1) □ hereto or 2) □ to Paper No./Mail Date . 

(b) ☒ including changes required by the attached Examiner's Amendment / Comment or in the Office action of 

Paper No./Mail Date : 

Identifying indicia such as the application number (see 37 CFR 1.84(c)) should be written on the drawings in the front (not the back) of 
each sheet. Replacement sheet(s) should be labeled as such in the header according to 37 CFR 1.121(d). 

7. □ DEPOSIT OF and/or INFORMATION about the deposit of BIOLOGICAL MATERIAL must be submitted. Note the 

attached Examiner's comment regarding REQUIREMENT FOR THE DEPOSIT OF BIOLOGICAL MATERIAL 



Attachment(s) 

1. ☒ Notice of References Cited (PTO-892) 

2. □ Notice of Draftperson's Patent Drawing Review (PTO-948) 

3. ☒ Information Disclosure Statements (PTO-1449 or PTO/SB/08), 

Paper No./Mail Date 11/22/04 

4. □ Examiner's Comment Regarding Requirement for Deposit 

of Biological Material 



5. □ Notice of Informal Patent Application (PTO-152) 

6. □ Interview Summary (PTO-413), 

Paper No./Mail Date . 

7. ☒ Examiner's Amendment/Comment 

8. ☒ Examiner's Statement of Reasons for Allowance 
9. ☒ Other: App'd Drawings & Fax Sheet 



U.S. Patent and Trademark Office 
Continuation of Attachment(s) 9. Other: Approved Drawings & Fax Cover Sheet. 
Art Unit: 2192 

EXAMINER'S AMENDMENT and 
EXAMINER'S STATEMENT OF REASON FOR ALLOWANCE 

1. This action is responsive to amendment filed December 13, 2004. 

Response to Amendments 

2. Per applicant's request, claims 1-19 have been amended; new claims 20-24 have 
been added. Claims 1-24 are pending. 

3. The objection to the drawings made in the previous office action is withdrawn 
in view of Applicants' amendments to the drawings to correct the identified 
informalities. Copies of the approved corrected drawings with the Examiner's initial 
are attached to this Office action. However, corrected drawings labeled as 
"Replacement Sheet(s)" must be submitted. 

INFORMATION ON HOW TO EFFECT DRAWING CHANGES 
Replacement Drawing Sheets 


Drawing changes must be made by presenting replacement sheets which incorporate 
the desired changes and which comply with 37 CFR 1.84. An explanation of the 
changes made must be presented either in the drawing amendments section, or 
remarks section, of the amendment paper. Each drawing sheet submitted after the 
filing date of an application must be labeled in the top margin as either "Replacement 
Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). A replacement sheet must 
include all of the figures appearing on the immediate prior version of the sheet, even 
if only one figure is being amended. The figure or figure number of the amended 
drawing(s) must not be labeled as "amended." If the changes to the drawing figure(s) 
are not accepted by the examiner, applicant will be notified of any required corrective 
action in the next Office action. No further drawing submission will be required, 
unless applicant is notified. 

Identifying indicia, if provided, should include the title of the invention, inventor's 
name, and application number, or docket number (if any) if an application number 
has not been assigned to the application. If this information is provided, it must be 
placed on the front of each sheet and within the top margin. 

Timing of Corrections 

Applicant is required to submit acceptable corrected drawings within the time period 
set in the Office action. See 37 CFR 1.85(a). Failure to take corrective action within 
the set period will result in ABANDONMENT of the application. 
If corrected drawings are required in a Notice of Allowability (PTOL-37), the new 
drawings MUST be filed within the THREE MONTH shortened statutory period set 
for reply in the "Notice of Allowability." Extensions of time may NOT be obtained 
under the provisions of 37 CFR 1.136 for filing the corrected drawings after the 
mailing of a Notice of Allowability. 

4. The objection to the specification is withdrawn in view of Applicants' 
amendments to the specification to correct minor informalities. 

5. The objection to claims 2, 6 and 9 is withdrawn in view of Applicants' 
amendments to these claims to correct minor informalities. 

6. The rejection of claims 3-19 under 35 U.S.C. § 112, first paragraph, as failing to 
comply with the written description requirement is withdrawn in view of Applicants' 
amendments to the specification to clarify the limitations "using information from 
preceding instructions to mimic an optimizing compiler" and "to mimic an optimizing 
compiler" recited in the claims. 
7. The rejection of claims 2-19 under 35 U.S.C. § 112, second paragraph, as being 
indefinite is withdrawn in view of Applicants' amendments to these claims to provide 
proper antecedent basis to the identified terms. 

8. The provisional obviousness-type double patenting rejection of claims 1-19 
over claims 1-29 of copending Application No. 10/016,794 is withdrawn in view of 
Applicants' filing of a terminal disclaimer. 

9. The rejection of claims 1 and 3-8 under 35 U.S.C. § 102(a) is withdrawn in view 
of Applicants' amendments to these claims to incorporate the identified allowable 
subject matter. 

Examiner's Amendment 

10. An examiner's amendment to the record appears below. Should the changes 
and/or additions be unacceptable to applicant, an amendment may be filed as 
provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST 
be submitted no later than the payment of the issue fee. 

Authorization for this examiner's amendment was given in a telephone 
interview with William F Ahmann, Reg. No. 52,548 on May 20, May 26, and June 3, 
2005. 

The application has been amended as follows: 
a. Claim 3: 

i. in line 11, after "said native machine code", delete the period and 
insert — ; and — 

ii. in line 12, insert — producing said native machine code in a single 
sequential pass in which information from preceding instruction translation is 
used to perform the same optimizing process of an optimizing compiler 
without the extensive memory and time requirements. — 
b. Cancel Claim 19. 

c. Claim 9: in line 5, after "setting", delete "said" 

d. Claim 13: in line 4, after "method duplicating or reordering", delete 
"said" and insert — a — 

e. Claim 14: in line 4, after "emitting native machine code using", delete 
"said" 

f. Claim 15: in line 4, before "stack mapping information", delete "said" 

g. Claim 21: replace claim 21 (page 15 of Amendments to the Claims) with 
the following: 

21. (currently amended) A computer implemented method, comprising: 
processing a first bytecode of a sequence of bytecodes; 

processing a second bytecode of the sequence of bytecodes using information 
[[associated with]] resulting from the processing of the first bytecode; and 

producing optimized native machine code in a single pass through the sequence 
of the bytecodes, using preceding translation information to optimize the native 
machine code. 

h. Cancel Claim 23. 

Examiner's Statement of Reasons for Allowance 

11. Claims 1-18, 20-22 and 24 are allowed. 

12. The following is an examiner's statement of reasons for allowance. 

The prior art of record, i.e., applicant's admitted prior art (APA), teaches 
attempts to improve compilation speed by compilation during idle times and by pre- 
verification. However, APA fails to teach or suggest improving compilation speed 
using a second code segment that creates optimized machine code from bytecodes 
received by a first code segment in a single sequential pass in which information 
from preceding instruction translation is used to perform the same optimizing 
process of an optimized compiler without the extensive memory and time 
requirement, as recited in the amended independent claims 1, 3 and 20. 

Furthermore, APA does not teach or suggest the limitations recited in claim 2. 

Moreover, a copy of a newly found reference, which is now made of record, 
i.e., Azevedo-Nicolau-Hummel, Java™ Annotation-Aware Just-in-Time (AJIT) 
Compilation System ("AJIT") was faxed to Applicants on June 1, 2005 for review. 
On June 2 and 3, 2005, Applicants' representative and the examiner discussed over 
the telephone the claim language of Claim 21 in light of the teachings of AJIT. 
Agreement was reached that by amending the claim language to read "using 
information resulting from the processing of the first bytecode" instead of "using 
information associated with the processing of the first bytecode," Claim 21 would 
distinguish over the teachings of AJIT. 

AJIT presents an alternative to an optimizing JIT compiler that makes use of 
code annotations generated by a Java™ front-end (i.e., Java™ compiler, a.k.a., Javac). 
These annotations carry information concerning compiler optimization. During the 
translation process, an annotation-aware JIT (AJIT) system then uses this information 
to produce high-performance native code without performing much of the necessary 
analyses or transformations. The process of producing native code from 
annotated Java™ bytecodes is done in a single pass over the bytecode stream. As 
each bytecode and its annotations bytes are read, a corresponding Kaffe IR operation 
is generated. The generated Kaffe IR operation depends on the information provided 
by the annotations. This information may suggest that the bytecode translation be 
entirely skipped, or that some sub-operations be eliminated or simplified. AJIT, 
however, fails to teach or suggest that processing a second bytecode of the sequence 
of bytecodes using information resulting from the processing of the first 
bytecode and producing optimized native machine code in a single pass through the 
sequence of bytecodes, using preceding translation information to optimize the 
native machine code (instant claims 21, 1 and 3). The Kaffe IR operation in AJIT 
depends on the information provided by the annotations that are associated with the 
bytecode that is currently processed as opposed to the information resulting from 
the processing of the preceding bytecode (emphasis added) as claimed in the present 
invention. 

Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submission should be clearly labeled "Comments on 
Statement of Reasons for Allowance." 

13. Any inquiry concerning this communication or earlier communications from 
the examiner should be directed to Hoang-Vu "Anton" Nguyen-Ba whose telephone 
number is (571) 272-3701. The Examiner can normally be reached on Tuesday- 
Friday, 7:15 to 17:15. 

If attempts to reach the Examiner by telephone are unsuccessful, the 
Examiner's supervisor, Tuan Dam can be reached at (571) 272-3695. The fax phone 
number for the organization where this application or proceeding is assigned is 703- 
872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR 
only. For more information about the PAIR system, see http://pair-direct.uspto.gov. 
Should you have questions on access to the Private PAIR system, contact the 
Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
Abstract 

The Java Bytecode language lacks expressiveness for traditional compiler 
optimizations, making this portable, secure software distribution format 
inefficient as a program representation for high performance. This 
inefficiency results from the underlying stack model, as well as the fact 
that many bytecode operations intrinsically include sub-operations (e.g., 
iaload includes the address computation, array bounds checks and the actual 
load of the array element). The stack model, with no operand registers and 
limiting access to the top of the stack, prevents the reuse of values and 
bytecode reordering. In addition, the bytecodes have no mechanism to 
indicate which sub-operations in the bytecode stream are redundant or 
subsumed by previous ones. As a consequence, the Java Bytecode language 
inhibits the expression of important compiler optimizations, including 
common sub-expression elimination, register allocation and instruction 
scheduling. 

The bytecode stream generated by the Java front-end is a significantly 
under-optimized program representation. The most common solution to 
overcome this aspect of the language is the use of a Just-in-Time (JIT) 
compiler to not only generate native code, but perform optimization as 
well. However, the latter is a time-consuming operation in an already 
time-constrained translation process. In this paper we present an 
alternative to an optimizing JIT compiler that makes use of code 
annotations generated by the Java front-end. These annotations carry 
information concerning compiler optimization. During the translation 
process, an annotation-aware JIT (AJIT) system then uses this information 
to produce high-performance native code without performing much of the 
necessary analyses or transformations. We describe the implementation of 
the first prototype of our annotation-aware JIT system and show 
performance results comparing our system with other Java Virtual Machines 
(JVMs) running on SPARC architecture. 

*This work supported in part by CAPES. 

1 Introduction 

The Java Bytecodes are emerging as a software distribution language for 
both its portability and safety features. The portability property of the 
language is ensured by the platform-independent stack machine model 
targeted by Java compilers. On the target machine, this intermediate code 
representation is either interpreted [17], or compiled into native code 
using traditional ahead-of-time [14] or just-in-time compilers [1, 2, 16, 
18, 24]. The safety features of the language are based on the security 
violation checks performed at load and run-time [11]. Such checks include 
enforcement of methods and variables access modifiers, strict 
type-checking and array bounds checking. Many of these checks are implicit 
in the bytecodes, forcing the JVM to perform them unless it can prove at 
load-time (via analysis) that the checks are unnecessary. 

In the design of the Java Bytecode language, a great deal of effort was 
spent to make it secure and portable. However, in order to be widely 
accepted, it must also yield efficient execution on a wide range of 
machine architectures. Unfortunately this is the weakest aspect of the 
language and is currently the focus of much research. The inefficient 
execution of Java Bytecode programs lies with the definition of the 
bytecodes themselves. The language is poor for conveying the result of 
many common and important compiler optimizations that are traditionally 
expressed in the native code generated by optimizing compilers. The direct 
translation of a bytecode stream generated by a Java front-end into target 
machine code results in low-quality code. 

The first limitation in expressing compiler optimization is the stack 
model of the Java Bytecodes. This 



Java Code 

public static void foo(int a[], int b[], int offset1, int offset2) { 
    for (int i = 0; i < a.length; i++) 
        a[i] = b[i] + offset1 + offset2; 
} 

IR                                     Optimized IR                         Optimized Bytecode 

 1: smovi 0, i                          1: iadd offset1, offset2, T1         0 iload_2 
 2: aadd a, "arraySizeOffset", T1       2: smovi 0, i                        1 iload_3 
 3: ild (T1), T2                        3: aadd a, "arraySizeOffset", T2     2 iadd 
 4: icmpge i, T2, T3                    4: ild (T2), T3                      3 istore 5 
 5: br T3 (18)                          5: icmpge i, T3, T4                  5 aload_0 
 6: ishl i, "ishift", T5                6: br T4 (16)                        6 arraylength 
 7: iadd T5, "arraySizeOffset", T6      7: ishl i, "ishift", T6              7 istore 6 
 8: aadd b, T6, T7                      8: iadd T6, "arraySizeOffset", T7    9 iconst_0 
 9: ild (T7), T4                        9: aadd b, T7, T8                   10 istore 4 
10: iadd T4, offset1, T8               10: ild (T8), T5                     12 goto 29 
11: iadd T8, offset2, T9               11: iadd T5, T1, T9                  15 aload_0 
12: ishl i, "ishift", T10              12: aadd a, T7, T10                  16 iload 4 
13: iadd T10, "arraySizeOffset", T11   13: ist T9, (T10)                    18 aload_1 
14: aadd a, T11, T12                   14: iadd i, 1, i                     19 iload 4 
15: ist T9, (T12)                      15: jmp (5)                          21 iaload 
16: iadd i, 1, i                       16: return                           22 iload 5 
17: jmp (2)                                                                 24 iadd 
18: return                                                                  25 iastore 
                                                                            26 iinc 4 1 
                                                                            29 iload 4 
                                                                            31 iload 6 
                                                                            33 if_icmplt 15 
                                                                            36 return 

Figure 1: Java Bytecodes as a language for program representation 



model provides no registers and restricts access to only the top element 
of the stack. Restricting access to the top of the stack prevents the 
reordering of bytecodes, a necessary transformation during instruction 
scheduling. And without registers to hold values, the stack model 
sequentializes computation and prevents the reuse of values (since again, 
only the top is accessible). Obviously, the lack of registers also 
prevents the expression of register allocation, a critical and potentially 
time-consuming optimization. 

The second limitation of the Java Bytecodes as a program representation is 
the fact that many bytecodes intrinsically encapsulate many machine 
sub-operations (e.g., iaload includes the address computation, array 
bounds checks and the actual load of the array element). The Java 
front-end can detect when sub-operations are redundant or subsumed by 
preceding sub-operations, and can try to apply traditional code-improving 
transformations in order to eliminate these sub-operations. However, the 
compiler is still limited by the stack-based nature of the Java Bytecodes, 
in which sub-operations cannot easily be separated, eliminated or 
rearranged. Furthermore, there is no mechanism in the language to disable 
sub-operations when deemed unnecessary. For this reason, straightforward 
compiler optimizations such as common sub-expression elimination, array 
bounds check elimination and loop-invariant code removal have limited 
expressiveness in Java Bytecodes. 

To demonstrate these limitations, consider the example in Figure 1. This 
example assumes that a RISC-like, three-address code Intermediate 
Representation (IR) is used in the Java front-end to bytecode compiler. 
The leftmost column shows the unoptimized IR¹ corresponding to the Java 
code at the top of Figure 1. The middle column shows the result of 
performing some simple optimizations, such as loop invariant removal of 
expression offset1 + offset2 and the array size reference. After 
optimizing this IR, the compiler is then able to produce the optimized 
bytecode stream shown in the last column. However, additional 
optimizations are possible that cannot be expressed in the final bytecode. 
For example, the sub-operations comprising array element accesses 
represent common sub-expressions and thus one could be eliminated (the 
index is the same for accessing the integer arrays a and b and therefore 
the array index computation in lines 6-7 and 12-13 in the leftmost column 
are redundant). Likewise, given the bounds on the loop, all array bounds 
checks involving a are unnecessary (and those involving b could be reduced 
to a single check before the loop starts). Clearly, the resulting bytecode 
has room for improvement. 
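
Read at the source level, the optimized bytecode in the last column of 
Figure 1 corresponds roughly to the hand-hoisted Java below. This is only 
an illustrative sketch: the class name and the locals sum and len are 
hypothetical (they stand in for local variable slots 5 and 6 of the 
bytecode), and a real front-end would perform the hoisting on its IR, not 
on source text. 

```java
// Illustrative sketch only; names other than foo are assumptions.
final class Figure1Sketch {
    // The method from the top of Figure 1.
    static void foo(int[] a, int[] b, int offset1, int offset2) {
        for (int i = 0; i < a.length; i++)
            a[i] = b[i] + offset1 + offset2;
    }

    // Source-level view of the "Optimized Bytecode" column:
    // the invariant sum and the array length are computed once.
    static void fooHoisted(int[] a, int[] b, int offset1, int offset2) {
        int sum = offset1 + offset2; // bytecode offsets 0-3 (istore 5)
        int len = a.length;          // bytecode offsets 5-7 (istore 6)
        for (int i = 0; i < len; i++)
            a[i] = b[i] + sum;       // each array access still carries its own
                                     // index computation and bounds check
    }
}
```

Even in this form, the shared index computation for a[i] and b[i] and the 
removable bounds checks cannot be written down, which is the gap the 
surrounding text describes. 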

The implication is that even though the Java front-end can compile a 
program into a clean and optimized sequence of bytecodes, a JIT compiler 
will still need to perform significant optimization in order to generate 
high-quality native code. This in turn implies that a JIT compiler will 
have to perform bytecode analysis to extract information about the program 
for the purposes of optimization. This introduces a potentially 
significant overhead in an already time-constrained JIT system. In this 
paper we present an alternative to the traditional optimizing JIT compiler 
based on bytecode annotations. In our annotation-aware JIT (AJIT) 
compilation system, the translation of bytecodes into high-performance 
native code is accomplished with the help of extra analysis information 
carried along with the bytecodes in the form of annotations. Our idea of 
Java Bytecode annotations was first introduced in [15]; in this paper we 
present the details of the implementation of our annotation-aware JIT 
system. In particular, we show how annotations are effective in carrying 
information concerning register allocation, common sub-expressions and 
value propagation. We also present some initial results on the performance 
of the code generated by our AJIT system, demonstrating that our approach 
outperforms other JVM implementations on the SPARC architecture. 

¹ Array bound checks have been omitted. 

The format of this paper is as follows. In the next section we present the 
structure of our annotation-generating Java front-end and discuss the 
types and formats of the annotations implemented in our first prototype. 
We also provide details concerning our compile-time register allocator 
that produces the annotations in support of dynamic register allocation. 
In Section 3 we discuss our annotation-aware JIT (AJIT) system and show 
how it uses annotations to implement run-time register allocation and 
produce native code. Next, in Section 4 we discuss related work. Finally, 
in Section 5 we present some preliminary results on the performance of our 
AJIT system, followed by our conclusions and discussion of future work in 
Section 6. 

2 Annotation-Generating Compilation 
System 

The idea of annotating a program representation with analysis information 
produced by a front-end compiler stems from the need to reduce the 
workload of run-time code optimizing systems. We have chosen to annotate 
the Java Bytecode representation, given its commercial success and 
widespread availability induced by its write-once-run-anywhere capability. 
However, the concept of annotations is a general one, and thus can be 
applied to any program representation. 

Our annotation types and formats vary with the kind of information that 
needs to be conveyed to the run-time code optimizing system. For example, 
it may consist of high-level program information that is not expressible 
in the lower-level program representation, or compile-time analysis 
information that is too time consuming to produce at run-time. Figure 2 
gives an overview of a general annotation-generating compilation system 
with a number of different annotations that we are currently working on. 
During the initial Java to bytecode translation, our annotation-generating 
compiler behaves as a traditional compiler. It builds a three-address code 
intermediate representation flexible enough to represent all the 
sub-operations that form each bytecode. On this IR traditional 
code-improving techniques (e.g., copy propagation, common sub-expression 
elimination, loop invariant code removal and register allocation) are 
applied and an optimized IR produced. Once this stage has been reached, 
each operation (or sequence of operations) is translated into an optimized 
stream of Java Bytecodes. Next, an annotation generator also reads the 
optimized IR, along with the data provided by various compiler analyses, 
and produces a set of annotations. Finally, the compiler performs a 
mapping phase in which the bytecode operations are paired with their 
corresponding IR operations and annotations, and then stores the annotated 
bytecode into the appropriate class file. 

For example, in the case of the Virtual Register Allocation (VRA) 
annotations (to be explained shortly), each bytecode is annotated with the 
source and destination registers allocated to the operands of that Java IR 
operation. Then, the bytecode stream is copied into the code attribute 
section of the class file together with the annotations, the latter being 
stored as an extra code attribute. Storing annotations in this way 
guarantees backward compatibility with existing JVMs, which by definition 
must ignore unknown code attributes [11]. 




Figure 2: Annotation-generating compiler and annotation-aware JIT (AJIT) 
system 

Our annotation-generating compiler was built on the freely available Java 
Bytecode compiler guavac version 0.3.1 [22]. From the Java source code, 
this compiler generates a parse tree and produces bytecodes. We augmented 
the compiler by (a) introducing functions for building and manipulating a 
three-address code IR, (b) implementing compiler optimizations for common 
sub-expression elimination, copy propagation and virtual register 
allocation, and (c) designing a Virtual Register Allocation annotation 
generator. This paper focuses in particular on the VRA annotations. The 
remaining annotations, as presented in Figure 2, are discussed in [15] or 
represent future work (see Section 6). 



Virtual Register Allocation annotations represent the result of performing 
register allocation assuming an infinite number of virtual registers. The 
information provided by the VRA annotations is then used by the JIT 
compiler to perform a fast and efficient dynamic register allocation and 
also to indicate which bytecodes (or bytecode sub-operations) are 
redundant² or subsumed by preceding operations; such operations need not 
be translated into native code. How the JIT compiler interprets these 
annotations, does register allocation, and produces native code is 
explained in Section 3. In the remainder of this section we discuss the 
format of VRA annotations and how the Java front-end compiler produces 
them. 

Each instruction defined in the Java Bytecode language is mapped into 
operations in our Java IR. Annotations for virtual register allocation 
basically hold information on the operands of the Java IR operations. The 
VRA annotations represent source operands, destination operands, and any 
intermediate values implicitly calculated by the bytecode sub-operations 
(e.g., array index calculation in an array load operation). For each 
bytecode instruction one or more VRA annotation formats exist. Each format 
indicates how a particular bytecode sub-operation should be translated: 
where to read its input operands, where to write the result, and perhaps 
whether or not this operation should be skipped entirely (e.g., when a 
previous operation has already computed the needed value). 

Figure 3 shows an example of correspondence between bytecodes, Java IR and 
VRA annotation formats. The SRC, EXTRA and DEST fields each hold virtual 
register numbers representing the operands for the sub-operations. In Case 
1 of Figure 3, the Java IR code sequence for the computation performed by 
the bytecode iaload is illustrated. The most general format of an iaload 
operation includes 2 SRC fields, 2 EXTRA fields and one DEST field, with 
SRC-SRC-EXTRA-EXTRA-DEST as annotation header format. The first SRC field 
represents the virtual register that holds the array object reference; the 
second SRC field represents the virtual register that holds the index; the 
first EXTRA field represents the result of the array index calculation; 
the last EXTRA field represents the result of the array address 
calculation; and the DEST field represents the virtual register holding 
the array element read from memory. If the address has already been 
computed, as in Figure 3 Case 2, the header SRC-DEST indicates that the 
SRC field holds the array element address and the DEST field is the 
suggested virtual register to hold the value read from memory, meaning 
that the translation process can skip the sub-operations for array index 
and address calculation and the bytecode iaload can be translated into a 
single load operation. 

² As discussed earlier, redundant bytecodes appear in the optimized 
bytecode stream due to the stack machine model. 



Case 1: Array element address calculation and array load 

Bytecode    Java IR                                 Annotated Bytecode 
iaload      V0 holds array address                  opcode  SRC  SRC  EXTRA  EXTRA  DEST 
            V1 holds index                          iaload  V0   V1   V2     V3     V4 
            1: ishl V1, "ishift", V2 
            2: iadd V2, "arraySizeOffset", V2 
            3: aadd V0, V2, V3 
            4: ild (V3), V4 

Case 2: Array load 

Bytecode    Java IR                                 Annotated Bytecode 
iaload      V0 holds array element address          opcode  SRC  DEST 
            4: ild (V0), V1                         iaload  V0   V1 

Figure 3: Example of VRA annotations for iaload operation 
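
The two cases of Figure 3 can be sketched as a small dispatch on the 
annotation header. The code below is a hypothetical illustration, not the 
paper's implementation: the class and method names are invented here, the 
header is passed as a string and the virtual register numbers as plain 
ints (the actual byte-level encoding of annotations is not given in this 
section), and the emitted operations are rendered as text in the Java IR 
notation used by the figure. 

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expands one annotated iaload into Java IR text.
final class IaloadAnnotationSketch {
    /**
     * @param header annotation header, e.g. "SRC-SRC-EXTRA-EXTRA-DEST"
     * @param v      virtual register numbers, one per header field, in order
     */
    static List<String> translate(String header, int... v) {
        List<String> ir = new ArrayList<>();
        switch (header) {
            case "SRC-SRC-EXTRA-EXTRA-DEST":
                require(v, 5);
                // v[0]=array reference, v[1]=index, v[2]=index calculation,
                // v[3]=element address, v[4]=loaded element (Figure 3, Case 1)
                ir.add("ishl V" + v[1] + ", \"ishift\", V" + v[2]);
                ir.add("iadd V" + v[2] + ", \"arraySizeOffset\", V" + v[2]);
                ir.add("aadd V" + v[0] + ", V" + v[2] + ", V" + v[3]);
                ir.add("ild (V" + v[3] + "), V" + v[4]);
                break;
            case "SRC-DEST":
                require(v, 2);
                // Address already available in v[0]: a single load (Case 2).
                ir.add("ild (V" + v[0] + "), V" + v[1]);
                break;
            default:
                throw new IllegalArgumentException("unhandled header: " + header);
        }
        return ir;
    }

    private static void require(int[] v, int n) {
        if (v.length != n)
            throw new IllegalArgumentException("expected " + n + " operands");
    }
}
```

Calling translate("SRC-DEST", 0, 1) yields the single operation of Case 2, 
while the five-field header reproduces the four operations of Case 1. 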

In Figure 4 we show how local variables and class
member variables are represented in our Java IR. Local
variables are directly mapped to virtual registers. Local
variable accesses (e.g., iload and istore) are represented
in our Java IR as nop operations or move operations,
annotated as SRC-DEST, CONST-DEST, CONST or
SRC, depending on the result of optimizing the Java IR
via copy propagation. When the JIT interprets the
annotation formats SRC or CONST, it knows
that either (a) the local variable is in the virtual register
indicated by the byte following the format header, or
(b) it is a constant. In both cases, no machine code is
generated for the bytecode. Class member variables are
kept as variables in memory in our front-end compiler
and accessed via load and store operations, as shown in
Figure 4 for bytecodes getstatic and getfield. As
a consequence, these variables are also kept in memory
in our AJIT system. To enable some optimization of
accesses to class member variables, we devised annotations
that make the variable address calculation explicit,
just like those for array references. For example,
bytecode getfield has the annotation formats
SRC-DEST and SRC-EXTRA-EXTRA-DEST, which state
whether or not the variable's address has already been
computed.
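
The treatment of local variable accesses described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `translate_local_access` and the tuple form of the move operation are hypothetical, while the format names and the `imov` mnemonic come from the text and Figure 4.

```python
def translate_local_access(fmt, operands):
    """Return the IR operations for an annotated iload/istore (hypothetical helper)."""
    if fmt in ("SRC", "CONST"):
        # The value already lives in a virtual register or is a known
        # constant, so the bytecode produces no code at all.
        return []
    if fmt in ("SRC-DEST", "CONST-DEST"):
        # Copy propagation could not remove the access: emit one move.
        source, dest = operands
        return [("imov", source, dest)]
    raise ValueError(f"unsupported annotation format: {fmt}")


no_code = translate_local_access("SRC", ["V1"])
one_move = translate_local_access("CONST-DEST", [7, "V1"])
```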

The choice of which virtual register holds an
operation's operands is crucial to the register allocation
done at run-time. In order to enable fast and efficient
dynamic register allocation, the VRA annotations
must convey the order in which variables should be
allocated to physical registers (and thus which should be
spilled if necessary). This is accomplished by assigning,
at compile-time, the lowest virtual register numbers to
the most important variables in the code. Then, at
run-time, the register allocator should assign the lowest





Bytecode    Java IR                              VRA Annotation Format

iload       nop                                  SRC
            imov V1, V2                          SRC DEST

istore      imov const, V1                       CONST DEST

getstatic   amovi "addressOfClassField", V1      EXTRA DEST
            {b,c,s,i,l,d,f,a}ld (V1), V2
            {b,c,s,i,l,d,f,a}ld (V1), V2         SRC DEST
            nop                                  SRC

getfield    amovi "offsetOfField", V2            SRC EXTRA EXTRA DEST
            aadd V1, V2, V3
            {b,c,s,i,l,d,f,a}ld (V3), V4
            aadd V1, V2, V3                      SRC SRC EXTRA DEST
            {b,c,s,i,l,d,f,a}ld (V3), V4
                                                 SRC DEST
            nop                                  SRC

Figure 4: Example of VRA annotations for local variable
and class member variable accesses

virtual register numbers to the physical machine registers.
The details of our compile-time register allocation
algorithm are presented in Section 2.1.

When designing the VRA annotations we opted for a
format that is easy for the run-time system to decode,
so that processing the annotations incurs minimal
overhead. The general VRA annotation formats
include a byte-long header followed by a variable num- 
ber of bytes representing the virtual register numbers. 
The header indicates how the subsequent annotation 
bytes should be interpreted. In our first prototype, we 
did not try to optimize the space consumed by the an- 
notations, and thus we found that our annotations can 
double the size of the bytecode stream [15]. Another, 
potentially more significant problem is that of verifica- 
tion: to maintain security, a scheme is needed to verify 
the safety of the annotations in the class file, as mali- 
cious or incorrect annotations can lead to unsafe native 
code. We have not yet devised an algorithm for validat- 
ing VRA annotations. 
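
The header-plus-operands layout described above can be illustrated with a small decoder. This is a sketch under stated assumptions: the numeric header codes in `FORMATS` and the function name `decode_annotation` are hypothetical, since the text does not give the actual byte values; only the layout (one header byte followed by one byte per virtual register) comes from the description.

```python
# Hypothetical header codes; the actual byte values are not given in the text.
FORMATS = {
    0x01: ("SRC",),
    0x02: ("SRC", "DEST"),
    0x03: ("SRC", "SRC", "EXTRA", "DEST"),
    0x04: ("SRC", "SRC", "EXTRA", "EXTRA", "DEST"),
}


def decode_annotation(buf, pos):
    """Decode one annotation starting at buf[pos].

    Returns (fields, next_pos), where fields pairs each role named by the
    header with the virtual register number stored in the following byte.
    """
    roles = FORMATS[buf[pos]]
    end = pos + 1 + len(roles)
    if end > len(buf):
        raise ValueError("truncated annotation")
    return list(zip(roles, buf[pos + 1:end])), end


stream = bytes([0x04, 0, 1, 2, 3, 4, 0x02, 0, 1])
first, pos = decode_annotation(stream, 0)
second, pos2 = decode_annotation(stream, pos)
```

Because the header alone fixes how many bytes follow, each annotation is decoded with one table lookup and one slice, which matches the stated goal of minimal run-time decoding overhead.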

2.1 Compile-time Register Allocation 

In our annotation-generating compiler we implement a
modified priority-based graph-coloring algorithm. In a
traditional Chaitin-style graph coloring algorithm [4, 5],
an interference graph is pruned to decide the order in
which live ranges are assigned to colors (and ultimately
registers). A priority-based coloring algorithm [6] uses
heuristics and cost analyses to determine the ordering
of live ranges and guarantees that the most important
live ranges are assigned colors first. In our compiler,
variables (including method parameters, method local
variables, class variables and compiler-generated
temporaries) are prioritized by their static reference counts,
with references inside loops, no matter how deeply
nested, counting as 10. After the generation of the Java
IR, the compiler runs data-flow analyses and performs
copy propagation and common sub-expression elimination.
At this point loop structures are also identified
and static reference counts are calculated. The first
step of our register allocator is to build a priority list
of variables using this information. When static
reference counts are equal, the priority of a variable is
dictated by the order in which it was declared in the
code. As we want to keep the number of virtual
registers as small as possible, we assign the same virtual
register number to variables with non-conflicting live
ranges. This is accomplished by building the interference
graph, which gives us information on conflicting live
ranges. Using the information provided by the interference
graph, the register (color) assignment algorithm
picks variables from the priority list and assigns virtual
register numbers (colors) to them, reusing the lowest
available virtual register number or creating a new virtual
register number in the case of conflicts.
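
The assignment step described above can be sketched as follows. This is a simplified Python illustration, not the authors' code: the function name `assign_virtual_registers`, the input encoding, and the `LOOP_WEIGHT` constant name are assumptions, and the sketch ignores the per-type restriction on register reuse.

```python
LOOP_WEIGHT = 10  # a reference inside a loop counts as 10, per the text


def assign_virtual_registers(variables, interference):
    """Assign virtual register numbers by priority (hypothetical helper).

    variables    -- list of (name, refs_outside_loops, refs_inside_loops),
                    given in declaration order
    interference -- dict mapping a name to the set of names whose live
                    ranges conflict with it
    """
    def priority(item):
        position, (_, outside, inside) = item
        # Higher weighted count first; declaration order breaks ties.
        return (-(outside + LOOP_WEIGHT * inside), position)

    ordered = [var[0] for _, var in sorted(enumerate(variables), key=priority)]
    assignment = {}
    for name in ordered:
        taken = {assignment[n] for n in interference.get(name, ()) if n in assignment}
        reg = 0
        while reg in taken:  # reuse the lowest number that does not conflict
            reg += 1
        assignment[name] = reg
    return assignment


variables = [("a", 2, 0), ("i", 1, 3), ("t", 1, 0), ("u", 1, 0)]
interference = {"i": {"a", "t", "u"}, "a": {"i", "u"}, "t": {"i"}, "u": {"i", "a"}}
result = assign_virtual_registers(variables, interference)
```

Here the loop variable `i` receives virtual register 0, `a` and `t` share register 1 because their live ranges do not conflict, and `u` needs a new register 2.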

In our register allocation algorithm, when assigning
virtual register numbers we associate each virtual
register number with the Java type of the variable it is
allocated to, and we do not allow, for example, a
virtual register holding an integer to later be reused to
hold a floating-point value. This restriction, although
it has the adverse effect of increasing the number of
virtual registers, also guarantees that the mapping of
a virtual register to a physical register is fixed in the
run-time compiler. Otherwise, the frequent re-mapping
of virtual registers to physical registers to comply with
variable types and machine register assignment
restrictions would conflict with the virtual register priorities,
leading to an increase in spills and lower performance.

3 Annotation-Aware JIT (AJIT) 
Compilation System 

The rightmost portion of Figure 2 depicts our
annotation-aware JIT (AJIT) system. We modified the public
domain JIT compiler system Kaffe [24] (version 0.9.2)
to implement our annotation scheme. The changes were
concentrated in a small number of files and consisted of the
design of a new register allocator, modifications to the
generation of Kaffe's internal intermediate representation,
and changes to its SPARC code generator. Both
the original and new functionality coexist in the
system, allowing the processing of annotated methods and
non-annotated methods within the same class file.

As VRA annotations are derived from translating
bytecodes into a RISC-like three-address code, one
wonders whether they are general, flexible and helpful enough
to produce optimized code for different target
architectures. We experimented with the Intel architecture
in [15], and now with the SPARC architecture in this
paper: two distinct architectures (CISC and RISC
respectively). Our annotation scheme has proven
sufficient for generating code for these two
platforms. As we experiment with other architectures our
annotation types and formats will be refined
accordingly.

In our AJIT system, when a class method is first
called, the bytecode stream is read into a table buffer.
If there is an annotation code attribute, the
annotations are also read into an annotations table. Then
the JIT compiler invokes the corresponding translation
routine. The process of producing native code from
annotated Java bytecodes is done in a single pass over
the bytecode stream. As each bytecode and its
annotation bytes are read, the corresponding Kaffe IR
operation(s) is (are) generated. The generated Kaffe IR
operation (or sequence of operations) depends on the
information provided by the annotations. This
information may suggest that the bytecode translation be
entirely skipped, or that some IR operations be
eliminated or simplified. Figure 5 shows a code example of
how an iaload bytecode operation is translated using
annotation information.

The operands of the translated Kaffe IR operations are
specified by virtual register numbers, extracted from the
annotation bytes. Once the entire bytecode stream has
been processed, SPARC native code is produced from
the Kaffe IR. At this point, as each Kaffe IR operation
is translated into native code, the register allocator is
invoked to replace virtual register numbers with
machine registers.

The run-time register allocator is a fast and effective
algorithm that essentially maps each virtual
register to a machine register, prioritizing the assignment
of lower virtual register numbers. This guarantees that
high priority values (program variables represented by
lower virtual register numbers) have preference in the
register assignment. When the physical registers
are exhausted, virtual registers are mapped to
temporaries on the stack. In the case of the SPARC
architecture, the register allocator reserves four registers
of each type (the global integer registers g4-g7
and the floating point registers f28-f31) for
evaluating expressions that involve variables that are
not mapped into machine registers. It uses local
registers l0-l7, global registers g1-g3, any unused input
register i0-i5 and floating point registers f0-f27
during allocation. Registers o0-o7 are not available to
the allocator and are reserved for passing parameters
to method calls. Our register allocation algorithm uses
a mapping table as an auxiliary data structure. Each
entry of the mapping table stores a virtual register
number, a pointer to the corresponding physical
register table entry, and the stack offset to use
in case of spilling. There are some details in the
initialization of the mapping table to correctly handle the
SPARC register windows convention; these details are
taken care of in the method's prologue and in the
translation of bytecodes for accessing method local variables.
Method local variables that are parameters are passed
in special integer registers (i0-i5), forcing the mapping
of virtual registers associated with these parameters.
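
The core of the mapping rule described above can be sketched in a few lines. This is a deliberately simplified Python illustration: the function name `build_mapping`, the tuple encoding of table entries, and the 4-byte slot size are assumptions, and the sketch ignores register types, reserved scratch registers, and the forced mapping of parameters to i0-i5.

```python
def build_mapping(num_virtual, physical_regs, first_spill_offset=0, slot_size=4):
    """Map virtual register numbers to machine registers or stack slots.

    Lower virtual register numbers (higher priority values) get physical
    registers first; the remainder receive consecutive stack offsets.
    """
    mapping = {}
    offset = first_spill_offset
    for vreg in range(num_virtual):
        if vreg < len(physical_regs):
            mapping[vreg] = ("reg", physical_regs[vreg])
        else:
            mapping[vreg] = ("stack", offset)
            offset += slot_size
    return mapping


table = build_mapping(5, ["l0", "l1", "l2"])
```

The loop visits each virtual register once and consults no interference graph or data-flow information, so the table is built in time linear in the number of virtual registers.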

In our experiments we observed that machine calling
conventions can complicate the simple mapping-based
register allocation, as they force virtual register
assignments to specific machine registers. This may break
virtual register priorities, and the register allocator
compensates by spilling lower priority physical registers when a
higher priority virtual register needs a physical register
and none is available. We are currently studying the
effect of different calling conventions on our
mapping-based dynamic register allocator.

Our current register allocation scheme does not try
to minimize the cost of subroutine calls. At method call
boundaries, move operations are generated to guarantee
that values are in the registers required by the
calling convention, and all active registers are
spilled. Our annotation scheme could be used to carry
information on which values produced in the program are
later passed to methods as parameters and also which
registers should be saved across procedure calls. Having
the first kind of information would guide the register
allocator in the virtual to physical register mapping and
would avoid some copies. The second kind of
information would decrease the overhead of subroutine calls by
spilling only the registers that are later referenced in
the program. We are currently investigating how the
virtual register allocator in our annotation-generating
front-end can be extended to lower the cost of method
calls.

To prove that our AJIT system is an acceptable
engineering solution we need to quantify the overhead
of processing the annotated bytecode stream and the
overhead of our mapping-based register allocation in
the process of generating optimized native code. If we
generate better native code more efficiently than
traditional optimizing JIT compilers, we will have shown that our
framework is a good solution for improving the overall
execution speed of Java programs. Annotation
overhead results from many factors: (1) the larger class file
size (which increases download time), (2) the
interpretation of the information conveyed in the annotation
bytes (see the extra processing required to build the
Kaffe JIT IR in Figure 5), (3) additional work done by
the run-time register allocator, and (4) the demand for
extra resources (memory for storing annotations).
Network applications are sensitive to the download time
overhead, but other types of applications that do not
depend on annotated class files being downloaded are
    if (a.header == SRC_SRC_EXTRA_EXTRA_DEST) {
        index = *(a.VRAData); objref = *(a.VRAData+1);
        tmp1 = *(a.VRAData+2); tmp2 = *(a.VRAData+3); dest = *(a.VRAData+4);
        annotated_ishl_int_const(vrslots[tmp1].slots, vrslots[index].slots, SHIFT_jint);
        if (object_array_offset != 0)
            annotated_add_int_const(vrslots[tmp1].slots, vrslots[tmp1].slots, object_array_offset);
        annotated_add_ref(vrslots[tmp2].slots, vrslots[objref].slots, vrslots[tmp1].slots);
        annotated_load_int(vrslots[dest].slots, vrslots[tmp2].slots);
    } else if (a.header == CONST_SRC_EXTRA_EXTRA_DEST) {
        cindex = *(a.VRAConst); objref = *(a.VRAData);
        tmp1 = *(a.VRAData+1); tmp2 = *(a.VRAData+2); dest = *(a.VRAData+3);
        annotated_move_int_const(vrslots[tmp1].slots, (cindex << SHIFT_jint), NULL);
        if (object_array_offset != 0)
            annotated_add_int_const(vrslots[tmp1].slots, vrslots[tmp1].slots, object_array_offset);
        annotated_add_ref(vrslots[tmp2].slots, vrslots[objref].slots, vrslots[tmp1].slots);
        annotated_load_int(vrslots[dest].slots, vrslots[tmp2].slots);
    } else if (a.header == SRC_SRC_EXTRA_DEST) {
        objref = *(a.VRAData); tmp1 = *(a.VRAData+1); tmp2 = *(a.VRAData+2); dest = *(a.VRAData+3);
        annotated_add_ref(vrslots[tmp2].slots, vrslots[objref].slots, vrslots[tmp1].slots);
        annotated_load_int(vrslots[dest].slots, vrslots[tmp2].slots);
    } else if (a.header == SRC_DEST) {
        tmp1 = *(a.VRAData); dest = *(a.VRAData+1);
        annotated_load_int(vrslots[dest].slots, vrslots[tmp1].slots);
    } else if (a.header == SRC) {
        // no action
    } else error = 1;

Figure 5: AJIT translation process for an iaload bytecode operation



not affected. In our AJIT system, the Kaffe run-time
IR is simple to build and manipulate. Other optimizing
JIT systems will need a more complex IR to enable
more advanced compiler transformations. We believe
that the overhead of processing the annotations, storing
them and building a simple run-time IR will
ultimately be less than the overhead of building, storing
and manipulating a complex IR in those systems.
Finally, our run-time register allocation algorithm
obeys a defined mapping rule and only
manipulates mapping tables. As a result, our register
allocator is simple and fast. No time is spent on
conflict graph construction, coloring or dataflow analysis,
tasks routinely performed by traditional register
allocators.

4 Related Work 

Various approaches have been proposed to overcome the
inefficiency of translating Java Bytecodes to native
code, and thus increase the execution speed of Java
programs. When compilation time is not a constraint,
the most common approach is to translate the
bytecodes into some higher-level intermediate form [7, 14]
or language [21], and then back to native code (perhaps
using an existing compiler, as in [21]). When speed
of compilation is an issue, optimizing JIT compilers
[1, 2, 16, 18, 24] try to improve the quality of the native
code generated on the fly by adapting traditional
optimization techniques to run-time code generation.
Optimizations can also be applied at load-time, i.e. after
bytecode generation yet before run-time translation to
native code; [8] is an example of such a bytecode
optimizer. Our annotation scheme is a hybrid approach in
that most work is done at compile-time to retain
important high-level program and optimization information,
while at run-time lightweight code-improving
transformations accomplish the task of generating high-quality
native code.

Research in the area of developing fast run-time
algorithms for traditional compiler optimizations is very
active. In the following paragraphs we give an overview of
commercial and academic systems, some of which make use of
annotation schemes to aid code optimization. We also
discuss how they implement run-time code optimizations
such as common sub-expression elimination,
register allocation and elimination of array bounds checking,
and how these implementations compare to the
run-time algorithms our annotation scheme requires. In all
optimizing JIT compilers there is an attempt to develop
compiler optimizations with algorithms that run in linear time with
respect to some parameter (e.g., the number of bytecode
instructions, or the number of local or stack variables).
Our annotation-based approach has also been designed
with this in mind; our VRA annotation scheme allows
run-time register allocation in linear time.

Several researchers exploit the idea of code
annotations in ways that relate to our approach. Though not designed
specifically to overcome the inefficiency of the Java Bytecode
language, these approaches could potentially be applied
to this problem. In the context of dynamic code
generation, code annotations in the form of programmer hints
[12] or extensions to high-level language constructs [20]
serve as a guide to where (and on what) dynamic
compilation should take place. These code annotations help
to build optimizing just-in-time compilers by extending
to run-time the applicability of traditional compiler
optimizations. Using these schemes researchers have
built different algorithms for copy propagation, dead
code elimination, register allocation and even advanced
cross-module optimizations. Different strategies are
applied to balance the tradeoff between dynamic
compilation speed and the quality of the generated code.

Most directly related to our VRA annotation scheme
is the work of Wall [23] on cross-module link-time
register allocation. In his approach, link-time register
allocation is treated as a form of relocation. The compiler
generates code that can be directly linked and executed,
but it annotates some of the instructions with register
actions that describe what needs to be done to the
instruction if the variables it manipulates are assigned to
a register at link time. Compared to our mapping-based
register allocation, Wall's approach has the overhead of
building the call graph and carrying out local data flow
analysis at link-time, and it depends on good usage
estimates (profiling information). However, it performs
global register allocation while our current
implementation only works intraprocedurally.

The Intel Java JIT compiler described in [2]
implements a limited form of common sub-expression
elimination (CSE). Our VRA annotation scheme allows a
traditional CSE algorithm to be implemented at
compile-time and has the further advantage of revealing
common sub-expressions implicit in bytecode operations.
In the Intel JIT compiler, register allocation is
accomplished via a priority-based algorithm. Our
mapping-based register allocation is also a priority-based scheme,
but faster at run-time as it dispenses with
any form of code analysis. In addition, our VRA scheme
can be expanded along the ideas presented in [23] to
allocate global variables, while the Intel JIT compiler
would need interprocedural data-flow analysis to
accomplish the same, implying an expensive run-time
algorithm. A very simple array bounds check elimination
algorithm was implemented in the Intel JIT compiler,
handling only constant indexes. As described in [15],
our run-time check annotations allow powerful subscript
analysis to be performed at compile-time and easily
convey this analysis information to the run-time system.

Another efficient JIT compilation system is CACAO 
[1]. CACAO implements copy propagation and register 
allocation by performing stack analysis at run-time and 
relying on the efficient coloring of local variables done 
by the Java front-end (by assigning the same local vari- 
able number to variables which are not active at the 
same time). CACAO also relies on the fact that stack 
slot variables have their lifetimes implicitly encoded.
Compared to our scheme, the stack analysis information
that has to be computed by their algorithm is provided
for free by our VRA annotations. On the other hand,
their run-time register allocator takes into account the 
cost of subroutine calls, which is lacking in our current 
scheme. If allocation of global variables is considered, 



interprocedural stack analysis would be necessary, and 
their algorithm would become more expensive. Other 
optimizations such as instruction scheduling, method 
inlining and array bounds check removal are planned 
for CACAO, but the run-time cost of these additions is 
not clear. 

Kaffe [24] is a freely available JVM that runs on 
several platforms; it serves as the basis for our
implementation work. Kaffe's native code translation
process builds a simple RISC-like IR as it loops through
the bytecode stream. Register allocation is combined
with machine code generation. The register allocator is 
based on a simple algorithm that maps stack and local 
variable slots to machine registers. When it runs out of 
registers, the least recently used register is spilled and 
freed for allocation. There is no special treatment to 
reduce subroutine calling costs, or to exploit machine 
calling conventions, as CACAO does. Upon a call, copy 
operations are introduced to guarantee values are in 
the correct register and all modified slots are spilled. 
No other compiler optimizations are implemented. In 
Section 5 we demonstrate that our AJIT system out- 
performs Kaffe in terms of the quality of the generated 
native code. 

An important optimizing run-time compilation
system is the Slim Binary project [10, 19]. This approach
proposes an architecture-neutral intermediate
representation for software distribution, called slim binaries,
that can be seen as an alternative to Java Bytecodes.
The dynamic compiler for slim binaries implements code
optimizations as background processes. Just like the
dynamic compilation systems discussed in [12, 20], this
system tries to utilize run-time information (e.g., values 
of variables and run-time profiling information) to per- 
form customized optimizations. Slim binaries incorpo- 
rate a more complex tree-based intermediate represen- 
tation, conveying control flow information but also in- 
curring some run-time overhead to manipulate it. Much 
like our annotation scheme extends the Java Bytecodes 
with extra information that is collected during tradi- 
tional compilation, the Slim Binary representation could 
benefit from our annotations scheme to decrease run- 
time optimization costs, such as carrying extra infor- 
mation to aid in register allocation. 

5 Results 

Our results revolve around four benchmarks: Neighbor,
which performs a nearest-neighbor averaging across all
elements of a two-dimensional array; EM3D, a code that
creates a graph and then performs a 3D electromagnetic
simulation [9]; Huffman, a character string
compression and decompression application; and Bitonic
Sort, which builds a binary tree and then performs
bitonic sorting (recursively) [3]. To measure the impact
of our AJIT system, we collected results using JVMs
available on the SPARC platform: Sun's JDK version
1.1.1 [17] and Kaffe JVM version 0.9.2 [24]. The
execution time results are shown in Table 1. Note that the
timings include neither translation nor compile time, and
thus represent the quality of the generated code. All
codes were compiled using our annotation-generating
Java Bytecode compiler and then executed using Sun's
interpreter, the Kaffe JIT compiler, and our AJIT
system.



Benchmarks               SUN Interpreter   Kaffe JIT    AJIT
                         (in secs)         (in secs)    (in secs)

Neighbor                      553.03         162.73      115.31
  256x256 array
  Iterations = 1500

EM3D                          359.84         149.86       74.51
  1250 tree nodes
  Iterations = 200

Bitonic Sort                  167.05         141.23      120.96
  1024 tree nodes
  Iterations = 512

Huffman                      4690.00        1856.00     1487.00
  .^0000 array nodes
  Iterations = 28fi

Table 1: Benchmarks execution times (in seconds)



Benchmarks       SpeedUp     SpeedUp
                 AJIT/SUN    AJIT/Kaffe

Neighbor           4.80        1.41
EM3D               4.83        2.01
Bitonic Sort       1.38        1.17
Huffman            3.15        1.25

Table 2: Benchmarks speed ups



The results presented in Table 1 reflect the sole
effect of our VRA annotations scheme. From the two
speedup columns of Table 2 we see that our
annotation-based approach offers speedups varying from 1.38
to 4.83 over direct interpretation, and is 17% to 100%
faster than Kaffe's JIT technology. Also notice that
the best speedups were achieved for codes consisting of
basic loops iterating over array-based or pointer-based
data (Neighbor, EM3D and Huffman). For such codes,
the VRA annotations helped to identify common
subexpressions and eliminate them, along with the
propagation of values and elimination of move operations.
These transformations correspond to optimizations that
could not be expressed in the Java Bytecodes directly.
The annotations also ensured that the most important
variables, such as loop index variables, were
permanently assigned to machine registers throughout method
execution. The smallest performance gain was observed
for the code with the highest number of subroutine calls:
Bitonic Sort, a recursive algorithm. This result is
explained by the way our AJIT system, and Kaffe as
well, handle subroutine calls during dynamic register
allocation. In short, neither JIT compiler takes
advantage of SPARC register windows. All active
registers are saved across method calls, introducing
significant overhead. It is important to note that this
overhead is not an intrinsic limitation of the algorithms,
but an artifact of the current implementations.
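
The speedups in Table 2 follow directly from the execution times in Table 1 (baseline time divided by AJIT time). The short Python check below recomputes them; the dictionary layout and the function name `speedups` are illustrative assumptions, while the timings are copied from Table 1.

```python
# Execution times in seconds from Table 1: (SUN interpreter, Kaffe JIT, AJIT).
TIMES = {
    "Neighbor": (553.03, 162.73, 115.31),
    "EM3D": (359.84, 149.86, 74.51),
    "Bitonic Sort": (167.05, 141.23, 120.96),
    "Huffman": (4690.00, 1856.00, 1487.00),
}


def speedups(times):
    """Return {benchmark: (speedup over SUN, speedup over Kaffe)}, rounded to 2 places."""
    return {
        name: (round(sun / ajit, 2), round(kaffe / ajit, 2))
        for name, (sun, kaffe, ajit) in times.items()
    }


table2 = speedups(TIMES)
kaffe_gains = [over_kaffe for _, over_kaffe in table2.values()]
```

The smallest and largest speedups over Kaffe are 1.17 and 2.01, which correspond to the "17% to 100% faster" range quoted above.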

The encouraging observation we have obtained from 
these preliminary results is that despite many limita- 
tions of the first implementation, our AJIT system is 
capable of producing machine code that executes up to 
twice as fast as current JIT technology. By extending 
our VRA annotations scheme with extra information, 
such as registers to be saved across subroutine calls, 
and improving the implementation of the dynamic reg- 
ister allocator itself, we believe the impact of our VRA 
annotations will be even more significant. 

6 Conclusions and Future Work 

Most approaches for speeding up Java execution resort
to dynamic compilation (and even dynamic code
re-optimization [13]). In this scenario, run-time costs must
be minimized and thus it is desirable that the bulk of
the compilation process be done statically at compile
time. Having a rich program representation conveying,
for example, dependence information to allow
instruction scheduling and support for dynamic register
allocation, will decrease the time spent on run-time code
generation by cutting down the time spent on program
analysis and transformation. In this paper we discussed
how the Java Bytecode language is a poor choice for a
high-performance program representation, since it
demands a more time-consuming code generation process
(at run-time!) in order to produce high-quality native
code. We presented an approach based on code
annotations that helps overcome this problem, and discussed
the implementation details of our resulting
annotation-aware JIT system.

Our first prototype implements the VRA annotation
scheme that conveys information for dynamic register
allocation. It also enables some basic code scheduling
by identifying and eliminating redundant computation
and allowing propagation of values. Preliminary results
show that we outperform current JIT technology, producing
code that runs up to twice as fast. We plan to extend
our VRA annotation scheme by incorporating
information that helps minimize the cost of subroutine calls
(e.g., values to be saved across procedure calls and
values passed as subroutine parameters) and allows
cross-module register allocation. We started with the
implementation of the VRA annotations scheme because
register allocation is the most important compiler
optimization on today's architectures. We also initially
selected scientific benchmarks to test our approach given
their higher sensitivity to such optimization. Figure 2
mentions a number of annotation possibilities that we
plan to explore in the future. These annotations
support more sophisticated compiler optimizations, such as
instruction scheduling and lifetime analysis for reducing
garbage collection. To help evaluate these annotations
we will study non-numeric Java benchmarks as well.